ABSTRACT



Thirty leaders of museums and libraries met at the Chicago 
Historical Society (October 5-7, 1999) to discuss common questions and 
concerns about digitization of collections and explore the ways that the 
World Wide Web is affecting their collection-based institutions. This report
presents the papers, under the headings of "Technology," "Audience," and
"Collections," that were prepared in advance of the meeting and summaries, in
each section, of the discussions they provoked. Following an introductory
section by Abby Smith, papers include: "Mainstreaming Digitization into the
Mission of Cultural Repositories" (Anne R. Kenney); "If You Build It and They
Come, Will They Come Back?" (Katherine P. Spiess and Spencer R. Crew);
"Library Collections Online" (Abby Smith); and "Museum Collections Online"
(Bernard Reilly). The report ends with a concluding discussion and a section
outlining next steps. Appendices include a list of conference participants
and a summary of the report, with tables and figures. (AEF) 
Putting Culture Online

by Abby Smith, Council on Library and Information Resources



The World Wide Web is the brainchild of a consortium of academics
who wished to create a content-neutral medium, open to all as a
means of communication. It did not take long for the Web to be
colonized by the commercial sector and, even more quickly, by a host
of self-publishers posting materials of varying value, reliability,
taste, purpose, and quality. As more and more information went up on
the Web, public figures began to call for "quality content" on the
Web, that is, things that have educational value and are created and
maintained by trusted, brand-name institutions. Museums and libraries
started receiving large sums from federal agencies and foundations,
as well as digging deep into their own pockets, to digitize their
collections.

How do museum and library collections translate into content on the
Web? When art and research objects go from real to virtual, how does
the relationship between object and viewer/user change? And who are
the users of museum and library Web sites?

Thirty leaders of museums and libraries met at the Chicago Historical
Society October 5-7, 1999, to discuss these questions and explore the
ways that the World Wide Web is affecting their collection-based
institutions. Collections, Content, and the Web was organized by the
Council on Library and Information Resources (CLIR) and the Chicago
Historical Society (CHS) and funded by the Institute of Museum and
Library Services (IMLS). For many who came, it was their first
opportunity to discuss common questions and concerns with peers from
other cultural communities. Libraries and museums share few
professional organizations, funding agencies, or external structures
that regularly bring them together for substantive purposes. We took
as our starting point one well-defined common feature of our
institutions: the fact that we have been doing business (in some
cases for more than 200 years) by collecting physical things in order
to make recorded knowledge and aesthetic experience accessible to our
patrons. We chose to focus on three key issues: collections,
audience, and, inevitably, technology. Because we asked questions
about the relationship between collections and audience, we
commissioned a survey of institutional Web sites to gather
preliminary data about how sites have been conceived and for whom,
and about who actually uses them.

Libraries and museums come to the Web with very different ex- 
periences of information technology. Libraries have long used auto- 
mation for managing the description, cataloging, and inventory con- 
trol of collections. They had used the Internet, the backbone of 
communication on which the Web ships its information cargo, long 
before the Web was created. This does not mean that they were nec- 
essarily early adopters of the Web, any more than were museums, 
which as a rule do not have the same robust technological infrastruc- 
tures as libraries for management of their collections. On the other 
hand, museums in the last several decades have made great strides 
in making their collections more accessible to a large public and have 
developed intellectual, aesthetic, and educational portals for onsite 
visitors to their institutions. 

Over the course of two days, participants at the Chicago meeting 
not only shared experience and expertise but also created a frame- 
work for an ongoing conversation that all hope will continue as we 
find our way in the new Web environment. The differences that be- 
came apparent between the operating assumptions of library and 
museum leaders were in some cases quite predictable. Perspectives 
on intellectual property, for example, diverged because of the tradi- 
tional roles that libraries have played in the administration of fair 
use in the print world and the particular interest that museums have 
had in protecting the rights of those artists whom they display. Mu- 
seums dealt forthrightly with issues of selection and presentation 
because they have a mandate to interpret. Librarians approached the 
matter of selection in some cases as if it were synonymous with cen- 
sorship, because they traditionally place a high value on making in- 
formation accessible without mediation. But in some cases the differ- 
ences between types of museums (art or historical) and types of 
libraries (academic or public) were even more striking. In summariz- 
ing the discussions, we have tried to represent distinctly these four 
points of view — public and academic libraries, art and historical muse- 
ums — to highlight the often-surprising intersections of values and con- 
cerns and the equally unexpected divergences of interest or experience. 

This report presents the papers that were prepared in advance of 
the meeting and summaries of the discussions they provoked. It also 
includes the Web survey that CLIR commissioned from the Institute 
for Learning Innovation, which was designed to gather preliminary 
data about museum and library Web site design and use. There is no 
way that this report can capture the full flavor or content of the con- 
versations that were begun in Chicago, but we hope that it serves to 
share many of the insights that participants brought to bear on a
variety of topics. Most of all, this report attempts to present the
framework in which we hope to continue the conversations so fruitfully
begun in Chicago.

We thank the Institute of Museum and Library Services for its
support of the conference. The grant was part of its new effort to 
forge working partnerships between libraries and museums. We 
thank our partner, the Chicago Historical Society, which helped 
shape the program and created a hospitable atmosphere for our de- 
liberations. We are especially grateful to those who came and gave 
their time and attention to the questions we posed. Their willingness 
to engage new and often difficult questions with candor and curiosi- 
ty transformed our conjecture — that museums and libraries that digi- 
tize their collections have a lot to talk about — into a spirited and in- 
spiriting exchange. Finally, we hope, through this report, to engage 
others who identify with the concerns aired here and wish to create 
collaborative structures for putting culturally significant materials on 
the Web. 
Technology



Mainstreaming Digitization into the
Mission of Cultural Repositories

by Anne R. Kenney, Cornell University Library



This conference on Collections, Content, and the Web brings
together leaders from the museum and library communities to
consider how the Web has affected the way we go about fulfilling
our cultural mission. In this paper, I will address four topics that
relate this technology to institutional responsibility, opportunity,
and cost. My underlying argument is that cultural institutions face a
point of critical transition. Over the past decade, they have come to
appreciate the value of digital efforts to extend their reach. They
must now appreciate that digitization is a normal part of doing
business, one that is worthy of commanding its share of institutional
resources.



Digital Collections Are Institutional Assets

As a normal part of doing business, institutions must create and
manage their digital collections properly to ensure their long-term
value and utility and to protect the investment that has been made in
them. Although no universally endorsed guidelines or standards
have been established for digital conversion of cultural resources,
there is a growing belief in the value of creating "digital masters"
that are rich enough to be useful over time in the most cost-effective
manner. This position presumes that conversion requirements will be
set at levels that are higher than either what is necessary to meet
immediate needs or what is capable of being used under current
technical environments. Michael Lesk and others have noted the
economics of converting once (or, at least, only once a generation)
and producing a sufficiently high-level image to avoid the expense of
reconverting at a later date when technological advances either
require or can effectively use a richer digital file (Lesk 1990). This
economic justification is particularly compelling given that the
labor costs associated with identifying, preparing, inspecting, and
indexing digital information far exceed scanning costs.
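The convert-once argument can be made concrete with a toy cost model. This is a hypothetical sketch: the per-page dollar figures are illustrative placeholders, not numbers from Lesk or from this report. It assumes only what the text asserts, namely that labor exceeds scanning cost and that every reconversion repeats that labor.

```python
def conversion_cost(pages, labor_per_page, scan_per_page):
    """Cost of one conversion pass over `pages` items.

    Hypothetical model: each pass repeats both the labor (identify,
    prepare, inspect, index) and the scanning itself.
    """
    return pages * (labor_per_page + scan_per_page)


# Illustrative, assumed per-page figures (not drawn from the report).
PAGES = 10_000
LABOR = 2.00       # labor dominates, as the text argues
SCAN_LOW = 0.50    # cheaper capture that meets only immediate needs
SCAN_HIGH = 1.00   # richer "digital master" capture

once_high = conversion_cost(PAGES, LABOR, SCAN_HIGH)
low_then_redo = (conversion_cost(PAGES, LABOR, SCAN_LOW)
                 + conversion_cost(PAGES, LABOR, SCAN_HIGH))

print(f"convert once at master quality: ${once_high:,.2f}")
print(f"convert cheaply, redo later:    ${low_then_redo:,.2f}")
```

Under these assumed figures, the cheaper first pass saves $5,000 in scanning but forfeits $25,000 when the labor must be repeated, which is the shape of the argument above.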

Institutional investments in creating high-quality digital masters 
are rewarded in the area of access and use. The library and museum 
communities are expressing a growing desire to develop cultural 
heritage resources that not only offer the broadest-possible use but 
also are comparable and interoperable across disciplines, user 
groups, and institutional types (NINCH 1999). Adopting a consistent 
approach facilitates integration between collections of images that 
artists and photographers are creating in digital form (the "born
digital") and the "born-again" digital files that institutions create
from their retrospective holdings. Peter Galassi, chief curator of
photography at the Museum of Modern Art (MoMA), suggests creating a
high-end digital master that is "purpose blind" (Sullivan 1998). Once 
created, the archival master can then be used to create derivatives to 
meet a variety of current and future users' needs. The quality, utility, 
and expense of various derivatives (e.g., for publication, image dis- 
play, computer processing) will be directly affected by the quality of 
the initial scan. 
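One way to see why derivative quality is bounded by the initial scan is to compute derivative dimensions from a master. The sketch below is hypothetical: the profile names and pixel sizes are illustrative assumptions, not standards from the text. It scales a master to each target size while preserving aspect ratio and refuses to upsample, since a derivative cannot contain detail the master never captured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DerivativeProfile:
    name: str
    long_edge: int  # target size of the longer side, in pixels


# Hypothetical profiles; each institution would define its own.
PROFILES = (
    DerivativeProfile("thumbnail", 150),
    DerivativeProfile("screen", 800),
    DerivativeProfile("print", 3000),
)


def derivative_size(master_w, master_h, long_edge):
    """Scale the master so its longer side equals long_edge, keeping the
    aspect ratio. A derivative is never larger than the master itself."""
    longest = max(master_w, master_h)
    if long_edge >= longest:
        return master_w, master_h  # cannot add detail the scan lacks
    return master_w * long_edge // longest, master_h * long_edge // longest


for master in [(6000, 4500), (1200, 900)]:  # hypothetical rich vs. modest scans
    sizes = {p.name: derivative_size(*master, p.long_edge) for p in PROFILES}
    print(master, sizes)
```

With the modest 1200 x 900 master, the "print" derivative is capped at the master's own size, so that use is effectively foreclosed unless the item is scanned again.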

In addition to the arguments for the economic advantages of 
converting once and for the creation of purpose-blind masters, pres- 
ervation is the third main argument that is advanced for investing in 
rich digital masters. Digital files can be created to replace or reduce 
the use of deteriorating or vulnerable originals if the digital surro- 
gates offer accurate and trusted representations. 

But we do not decrease the preservation problem by relying on 
digital information; we only increase it. As Terry Kuny put it (1988), 
"Being digital means being ephemeral." Digital files must be created 
in a consistent and well-documented manner to make them worthy 
candidates for long-term retention. Disposition decisions should be 
based on continuing value and functionality, not limited by technical 
decisions that were made at the point of conversion or anywhere else 
along the digitization chain. We must appreciate how decisions that 
are made at the point of capture can affect our ability to manage, pre- 
serve, and use our digital collections. 

Some guiding principles for safeguarding institutional assets in- 
clude the following: 

• Invest in the selection and creation of digital resources that have 
a high probability of use and reuse over time. 

• Address preservation concerns from the ground up, including 
adequate quality capture and review; requisite metadata; and the 
use of standard, well-supported technologies. Unless these is- 
sues are addressed at the point of creation, "There is little pros- 
pect of archiving image resources that will survive technological 
change." (Ester 1996; see also Day 1998; NISO, CLIR, RLG 1999) 1 



1 Day's work focuses on requisite preservation metadata. The NISO/CLIR/RLG
initiative on standardizing metadata should provide specific preservation 
guidelines for digital image collections. 
• Do not risk the master files by applying short-term solutions to
short-term problems (many of today's constraints will not be to- 
morrow's, and we should avoid building an approach that be- 
comes quickly outdated or superseded). 

• Establish a social security fund for digital files from institutional 
resources (digital assets must receive perpetual care, which re- 
quires ongoing resource commitment). 
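The principles above call for requisite metadata and perpetual care. A common ingredient of such care is a fixity checksum recorded when a master is created and rechecked later to detect silent corruption. The sketch below is a minimal, hypothetical illustration: its field names are invented for the example and are not the NISO/CLIR/RLG element set or any other formal schema.

```python
import hashlib
import json
from datetime import datetime, timezone


def preservation_record(data: bytes, filename: str, capture_notes: dict) -> dict:
    """Build a minimal, hypothetical preservation metadata record.

    Field names are illustrative only, not taken from a formal schema.
    """
    return {
        "filename": filename,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),  # fixity value
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "capture": capture_notes,  # e.g., resolution and bit depth at capture
    }


def verify_fixity(data: bytes, record: dict) -> bool:
    """Recompute the checksum and compare it with the stored value."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]


master = b"stand-in bytes for a master image file"
rec = preservation_record(master, "master_0001.tif",
                          {"resolution_ppi": 600, "bit_depth": 24})
print(json.dumps(rec, indent=2))
print(verify_fixity(master, rec))         # True
print(verify_fixity(master + b"x", rec))  # False: the file has changed
```

Recording capture settings alongside the checksum documents decisions made at the point of conversion, which the text identifies as the moment that most constrains later management.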



Digital Collections Increase Patron Use, Which 
Places New Demands on Cultural Repositories 

Cultural institutions experience incredible responses to digital re- 
sources that dwarf the use of their physical counterparts. The New 
York Public Library reports 10 million online hits a month, as op- 
posed to the 50,000 books served at 42nd Street, and the Library of 
Congress transmitted nearly 347 million files in the first eight 
months of 1999 (Darnton 1999). These raw figures say nothing about
how, or how well, this material is used; nonetheless, the ability to
extend access to resources exponentially is compelling, particularly
for a museum, where only a very small percentage of the total
collection is ever on view at any one time.

Increased use is a double-edged sword, however, placing inordi- 
nate demands on resources of all kinds. Simply accommodating so 
many users requires institutions to support extremely powerful ac- 
cess systems. Peter Hirtle has noted the experience of the Church of 
Jesus Christ of Latter-day Saints, which in 1999 announced free ac- 
cess to many of its genealogical databases. Demand far exceeded ex- 
pectations. The site had been built to handle 25 million hits a day — 
five times the anticipated use level. But in the first few weeks after it 
was opened to the public, the site recorded at least 40 million hits a 
day, and another estimated 60 million hits a day were turned away 
(Church of Jesus Christ of Latter-day Saints 1999). 
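The figures reported for the genealogy site imply a striking gap between planning and demand. The short calculation below uses only those figures; converting to hits per second is an added step, and it gives an average, whereas real traffic peaks well above it.

```python
SECONDS_PER_DAY = 24 * 60 * 60

capacity = 25_000_000        # hits/day the site was built to handle
anticipated = capacity // 5  # capacity was "five times the anticipated use"
served = 40_000_000          # hits/day recorded (at least)
turned_away = 60_000_000     # estimated hits/day refused
demand = served + turned_away

print(f"anticipated use: {anticipated:,} hits/day")
print(f"total demand:    {demand:,} hits/day "
      f"({demand / capacity:.0f}x capacity, {demand / anticipated:.0f}x anticipated)")
print(f"average load:    {demand / SECONDS_PER_DAY:,.0f} hits/second")
```

Demand of roughly 100 million hits a day is four times the built capacity and twenty times the anticipated use.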

A growing (and demanding) secondary clientele can tax staff re- 
sources. At Cornell, the Making of America Web site, consisting of 
19th-century journals and monographs, receives 4,000 hits a day. A 
large share of the users is made up of non-Cornellians, who expect
the digital library to act just like a regular library, replete with basic 
services. As the system becomes more stable, user requests have less 
to do with system difficulties and more to do with content inquiries, 
which often represent the interests of a general, rather than a scholar- 
ly, audience. Such questions as "What are my 1890s Harper's maga- 
zines worth?" make us feel a little more like an auction site than an 
educational site. Cornell began its digital library a decade ago under 
the rubric "any time, any place," and today must address the ques- 
tion of "anybody?" 

The issues raised by user response to digital collections lead to 
the last two points I want to address: overcoming barriers to Web use 
and financing the enterprise. 
Institutions Must Overcome Technical Barriers to
Effective Use on the Web 

Various user studies have concluded that all researchers expect the 
following things from displayed digital images: 

• fast retrieval, 

• acceptable quality, and 

• added functionality. 

Of course, they want many other things, too, such as the ability 
to print, to manipulate and annotate images, and to compare and 
contrast images. Increasingly, they want specialized services. In pro- 
viding digital access, conflicts inevitably arise regarding what a user 
may want, what is affordable, and what the technology can deliver. 

These expectations and inherent conflicts lead cultural institu- 
tions to confront a host of technical issues associated with quality, 
delivery, and utility that do not exist in the analog world. Unfortu- 
nately, no systematic assessment has been conducted to determine 
the cumulative effects of the total range of technological choices on 
the transmission and display of digital image material. File formats, 
compression processes, scripting routines, transfer protocols, Web 
browsers, processing capabilities, and the like combine to affect user 
satisfaction. This is particularly true when we consider the lag in 
technology adoption at the user's end. Users may think they want 
the highest quality, but they may be frustrated by how long it takes 
to download a file or may be disappointed when a beautiful color 
image displays in a largely posterized form. 

Speed of Delivery 

Speed of delivery is perhaps the major concern of users. A one-megabyte
file might be accessed in a tenth of a second on a fiber network
link but will take nearly three minutes on a V.90 modem. Because
network configurations cannot be controlled, cultural institutions 
have focused on constraining image file size to speed access. Typical- 
ly, institutions have reduced file size by limiting the resolution, or bit 
depth, or by applying compression. Each of these choices can have a 
pronounced effect on image quality. New and emerging file formats 
and highly efficient compression schemes such as FlashPix, GridPix,
and wavelet compression are gaining in popularity. They enable the
delivery of large images over slow network links with little quality 
loss and offer the user the means to pan and zoom. 2 Another option 
for increasing delivery speed is to bundle images together, which 
may not increase the initial delivery speed but can facilitate "flip- 
ping" through a cache of downloaded images in rapid time. The 
most notable example of this capability is found in the use of Adobe
Systems' PDF (portable document format) to view and print multipage
documents. Other options include the use of multi-image TIFF



2 Institutions experimenting with new file formats and compression schemes 
include the Library of Congress, the Library of Virginia, the University of 
Michigan, the U.S. Geological Survey, the Fine Arts Museums of San Francisco 
and the University of California at Berkeley, and the Cornell Johnson Art 
Museum. 
(tagged image file format) files, CPC (Cartesian perceptual
compression), and QuickTime movies.
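The delivery-time figures above follow from simple arithmetic: payload bits divided by link rate. In the sketch below, the fiber rate and the "real-world" modem rate are assumptions chosen for illustration, and the model ignores protocol overhead and latency, so actual downloads run somewhat slower. It also shows why halving file size, by lowering resolution or bit depth or by compressing, halves the wait.

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Idealized transfer time: payload bits divided by link rate.

    Ignores protocol overhead, latency, and line noise.
    """
    return size_bytes * 8 / bits_per_second


ONE_MB = 1_000_000  # decimal megabyte; 1,048,576 bytes would add about 5%

LINKS = {
    "V.90 modem, nominal 56 kbit/s": 56_000,
    "V.90 modem, assumed real-world 45 kbit/s": 45_000,
    "fiber link, assumed 100 Mbit/s": 100_000_000,
}

for name, rate in LINKS.items():
    secs = transfer_seconds(ONE_MB, rate)
    print(f"{name}: {secs:.2f} s ({secs / 60:.1f} min)")
```

At the nominal modem rate the file takes about 2.4 minutes; at a typical lower real-world rate it approaches the "nearly three minutes" cited, while the assumed fiber link needs under a tenth of a second.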

The rush to embrace these new technologies should be tempered 
by the need to protect digital assets from obsolescence. This concern 
has sparked a continuing debate within the cultural community over 
the use of compression in master image files or the adoption of
proprietary formats. As John Price-Wilkin has noted, "The Internet is
littered with 'good ideas,' particularly in the form of impressive
plug-ins or helper applications with frighteningly short life spans"
(in press; see also Dale 1999).

The need to reduce file size to speed delivery may be a limited- 
term concern as broad bandwidth information pipelines and wireless 
high-speed data transfer capabilities are developed in the next 5-10 
years to support research, electronic commerce, and entertainment. 
For instance, current Federal Communications Commission (FCC) 
rules require all analog broadcasts to be phased out by the end of 
2006. The potential of digital television, in particular high-definition 
television (HDTV), to provide new and different kinds of information
to a broad range of users (including access to digitized cultural
resources) is tantalizing (FCC 1998). Beginning with Internet2, the
U.S. government is funding efforts to build the Next Generation
Internet (NGI), which will link research labs and universities to
high-speed networks that are 100 to 1,000 times faster than the
current Internet. Designed to handle high volumes of information, the NGI
will make access to digital image files very easy and access to high- 
quality audio and moving-image transfer very practical (Cohen 
1999). 

Image Quality 

Users expect digital images to offer visual quality comparable to that 
of the original. However, as has been noted, image quality may be 
reduced by the need for timely delivery. Quality can be further com- 
promised by inadequate display technologies. Because monitor reso- 
lutions are often lower than those used to create digital image files, 
readers may be presented with difficult choices. They can choose a 
complete image, which can be delivered quickly but may be illegible; 
or they can examine image details but at the price of slow delivery 
and the ability to view only a fraction of the image at any given time. 
Color appearance is most problematic. The use of different browsers, 
the transfer between color spaces, or the reliance on underpowered 
monitors may affect it. Possible solutions include the use of sophisti- 
cated file formats such as portable network graphics (PNG), which 
supports both a Web-safe palette and sRGB, a color profile designed 
to ensure color consistency across platforms. Some institutions in- 
clude gray-scale and color targets with their images to enable the 
end user to adjust the color when necessary. Others have created 
electronic targets and specified monitor settings to assist users in cal- 
ibrating their monitors. Evidence suggests, however, that few users 
take advantage of these offerings. As Michael Ester has pointed out 
(1996), "The only controls that are apt to see widespread use are 
those that are built into applications and underlying software." I
suspect, however, that because color representation is a growing
concern in electronic commerce, basic solutions will be forthcoming. As
was learned in the mail-order business, no company can afford to 
handle too many returns and exchanges that are requested because 
the color of the ordered shirt does not match the color in the cata- 
log — whether in print or on the Web. 
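The posterization users encounter can be illustrated with the Web-safe palette mentioned above, which allows only six levels (0, 51, 102, 153, 204, 255) per red, green, and blue channel, or 216 colors in all. The sketch below snaps an arbitrary color to its nearest Web-safe neighbor; because the palette is a regular grid, snapping each channel independently gives the nearest palette color by Euclidean distance. The sample color is an arbitrary choice for illustration.

```python
WEB_SAFE_LEVELS = (0, 51, 102, 153, 204, 255)  # 6 levels per channel -> 216 colors


def nearest_web_safe(rgb):
    """Map an 8-bit RGB triple to the nearest color in the 216-color
    Web-safe palette by snapping each channel independently."""
    return tuple(min(WEB_SAFE_LEVELS, key=lambda level: abs(level - c)) for c in rgb)


# An arbitrary muted gray-green, as might appear in a scanned painting.
original = (122, 140, 118)
print(nearest_web_safe(original))  # → (102, 153, 102)
```

Each channel can move by up to 25 steps, so subtle gradations in an original collapse into flat bands, which is the visual effect described in the text.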

Functionality 

Digital image files are "dumb" files; they convey little beyond an 
electronic likeness of the original document or object. Additional 
work, which traditionally requires time-consuming descriptive cata- 
loging or manual indexing, is needed to bring intelligence to these 
files. Containing costs while keeping pace with rising user expecta- 
tions will require more automated image processing. Most of us are 
familiar with text conversion via optical character recognition (OCR) 
applications. These programs have improved tremendously, with 
error rates declining by half in the past few years because of advanc- 
es in core recognition technologies, in weighted voting, and in the 
use of automated error-reduction applications. But highly accurate 
text conversion is still an elusive goal for most handwriting, for non- 
standard scripts (such as Gothic), and for many nonroman languages 
(Dahl in press). 
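One common way to quantify OCR accuracy is the character error rate: the edit distance between the OCR output and a verified transcription, divided by the transcription's length. The text does not say which measure underlies the error rates it cites, so this is offered only as an illustration of the idea. The example uses a classic OCR confusion, reading "rn" as "m".

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(ground_truth: str, ocr_output: str) -> float:
    """Edit distance normalized by the length of the verified text."""
    return levenshtein(ground_truth, ocr_output) / len(ground_truth)


truth = "Museum of Modern Art"
ocr = "Museum of Modem Art"  # "rn" misread as "m"
print(character_error_rate(truth, ocr))  # → 0.1
```

A single misread costs two edits here (one substitution, one deletion), which shows why raw error rates can overstate or understate how readable a page remains.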

Interest in computer processing extends beyond textual informa- 
tion to graphic and photographic images. Raster-to-vector conver- 
sion software shows growing promise to create manipulable images 
for some graphic materials, such as maps, satellite and aerial photo- 
graphs, architectural drawings, and engineering plans, but this capa- 
bility still does poorly on rich, continuous-tone image files. Consider- 
able research is under way in the area of content-based image 
retrieval (CBIR) to automatically extract features that characterize an 
image's appearance. Today's CBIR is based primarily on numerical 
measures of shape, color, and texture and is currently most effective 
where there is a need to retrieve information by image appearance 
(e.g., finding items of a particular color) rather than image semantics 
(e.g., pictures of children on a beach). Creative use of current capabil- 
ities can lead to retrieval either by characterizing the search in terms 
of proportion and color (e.g., a beach is 75 percent yellow, 25 percent 
blue) or by identifying a particular shape (e.g., a tiger), which will 
retrieve similar shapes and patterns that will include tigers but also 
fur coats. Because CBIR is actively being investigated, improvements 
could be rapid, but the capability to automatically retrieve images by 
a particular artist or photographs from a particular decade remains 
an elusive goal (Wu in press; Eakins in press; Lesk 1998). 
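The "75 percent yellow, 25 percent blue" style of query can be sketched as comparing color proportions. Everything below is a hypothetical illustration: the bucketing thresholds are crude assumptions, the "images" are tiny lists of pixels, and L1 distance is just one simple choice of similarity measure; working CBIR systems use far richer color, texture, and shape features.

```python
from collections import Counter


def color_proportions(pixels, classify):
    """Fraction of pixels falling into each named color bucket."""
    counts = Counter(classify(p) for p in pixels)
    total = len(pixels)
    return {name: n / total for name, n in counts.items()}


def crude_classify(rgb):
    """Hypothetical, very coarse bucketing of an RGB pixel."""
    r, g, b = rgb
    if b > r and b > g:
        return "blue"
    if r > 150 and g > 150 and b < 120:
        return "yellow"
    return "other"


def proportion_distance(query, image_props):
    """L1 distance between two color profiles; smaller is a closer match."""
    names = set(query) | set(image_props)
    return sum(abs(query.get(k, 0.0) - image_props.get(k, 0.0)) for k in names)


beach_query = {"yellow": 0.75, "blue": 0.25}

# Two hypothetical four-pixel "images".
sandy_beach = [(230, 200, 90)] * 3 + [(60, 120, 220)]
night_city = [(20, 20, 20)] * 3 + [(240, 220, 100)]

for name, img in [("sandy_beach", sandy_beach), ("night_city", night_city)]:
    props = color_proportions(img, crude_classify)
    print(name, proportion_distance(beach_query, props))
```

Note that any yellow-and-blue image, a striped fabric for instance, would score as well as a beach: the measure captures appearance, not semantics, which is exactly the limitation the text describes.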

In addition to providing added functionality, we can offer auxiliary
features that facilitate more effective use of our collections.
Consider, for instance, the success of Amazon.com, which is due in
part to added capabilities that facilitate access, selection, and
ordering. Our digitized resources will be more accessible to a broader
community if we provide simple online tools that extend the
capabilities of their analog counterparts, such as the following:3

• automated perpetual calendar, enabling a reader to key in month 
and day information (e.g., October 6) and receive a listing of all 
years in which that date falls on a particular day of the week 
(e.g., Tuesday), 

• timelines to place historical items in the context of certain events, 

• currency conversion tools that not only translate pesos into 
pounds but also peg value to their relative worth for any date in 
history, 

• metric-to-English conversion tool, 

• listing of scientific, medical, business, and cultural signs and 
symbols, 

• multilingual dictionary and translation programs for text-search- 
able material, 

• dimension tools not only to facilitate the use of digitized maps
but also to enable the viewer to appreciate that a Dürer and a
Dalí may be of completely different scales (Handel 1995),4 and

• lists of "sightings" in museum and auction catalogs. 
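The first tool in the list above, the perpetual calendar, is simple enough to sketch. The function below is a hypothetical illustration, not a description of any existing Cornell tool. It uses Python's `datetime`, which applies the Gregorian calendar to all dates (proleptically), so results for dates before a given region adopted the Gregorian reform would need adjustment for historical work.

```python
from datetime import date

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")  # matches date.weekday() order


def years_with_weekday(month, day, weekday, start, end):
    """Years in [start, end] in which month/day falls on `weekday`.

    Uses the proleptic Gregorian calendar, as Python's datetime does.
    """
    target = WEEKDAYS.index(weekday)
    years = []
    for year in range(start, end + 1):
        try:
            d = date(year, month, day)
        except ValueError:  # e.g., February 29 in a common year
            continue
        if d.weekday() == target:
            years.append(year)
    return years


# The text's example: October 6 falling on a Tuesday, checked over the 1990s.
print(years_with_weekday(10, 6, "Tuesday", 1990, 1999))  # → [1992, 1998]
```

The `ValueError` guard lets the same query work for February 29, returning only leap years that match.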



Institutions Should Not Expect to Recover Costs 
Incurred in Digitization 

No consensus has been reached about what it costs to create — much 
less maintain and make accessible — digital image files. The cost fig- 
ures that are available vary tremendously, depending on the types of 
material being scanned, the image conversion requirements, the 
hardware and software used, and the range of functions covered in 
the calculations. There is no consistent price for outsourcing image 
conversion from vendor to vendor, or even from project to project, 
that is analogous to what we experience in other conversion efforts 
such as preservation microfilming. 

We probably know the most about text scanning of disbound 
volumes, with estimates ranging from $.10 to $.30 per image for large 
production projects.5 Figures for bound volume scanning are per- 
haps twice that amount. A number of institutions have found that 
they can obtain a better product and faster production rate when 
bound items are rendered into single leaves for scanning, even when 
the costs of rebinding are included (MacIntyre and Tanner 1998; ILEJ 
1999). 
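The per-image ranges above lend themselves to a quick budgeting sketch. The project size below is a hypothetical assumption, and the bound-volume factor of two simply encodes the text's "perhaps twice that amount." As the text stresses, such figures cover scanning alone, not the selection, preparation, indexing, and long-term maintenance costs that are far less well understood.

```python
def scanning_cost_range(images, low=0.10, high=0.30, bound=False):
    """Return (low, high) dollar estimates for scanning `images` pages.

    Defaults reflect the $.10-$.30 per-image range reported for disbound
    volumes; bound volumes are treated as roughly twice that, per the text.
    """
    factor = 2 if bound else 1
    return images * low * factor, images * high * factor


pages = 100_000  # hypothetical project size
for bound in (False, True):
    lo, hi = scanning_cost_range(pages, bound=bound)
    label = "bound" if bound else "disbound"
    print(f"{label}: ${lo:,.0f} to ${hi:,.0f}")
```

A threefold spread between low and high estimates, before any other costs, illustrates why the text finds no consistent price for conversion.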



3 I am indebted to my archival colleagues at Cornell for many of these
suggestions. 

4 Technical development at the Blake Archives Project includes a Java applet (The 
ImageSizer) to view Blake's work on screen at the actual physical dimensions. 
Available at http://www.iath.virginia.edu/blake/. 

5 These figures have been reported by Cornell, Michigan, and JSTOR (Journal 
Storage) (see also Odlyzko 1999). The Andrew W. Mellon Foundation has funded 
a project at the University of Michigan to document the full range of costs 
associated with digitization in a production environment. The results of that 
study will be available in late 2000. 